feat(scaler): add observability (metrics + tracing) to the external scaler by Fedosin · Pull Request #1634 · kedacore/http-add-on

Fedosin · 2026-05-13T15:14:19Z

Add OpenTelemetry-based metrics and distributed tracing to the external
scaler component, which previously had no observability instrumentation.

Shared observability infrastructure is extracted into pkg/observability/
so both the interceptor and scaler reuse the same tracing setup, metrics
provider, and configuration types.

Metrics:

- scaler.pinger.fetch.duration (histogram): duration of each queue pinger fetch cycle
- scaler.pinger.fetch.errors (counter): total failed pinger fetch cycles
- scaler.pinger.endpoints (gauge): number of interceptor endpoints being polled
- Prometheus /metrics endpoint on port 2223 (configurable via OTEL_PROM_EXPORTER_PORT)
- Optional OTLP HTTP metrics export (via OTEL_EXPORTER_OTLP_METRICS_ENABLED)

Tracing:

- OTEL tracing SDK with console, HTTP/protobuf, and gRPC exporters
- otelgrpc stats handler for automatic gRPC server span instrumentation
- W3C TraceContext + Baggage propagation

Configuration env vars (same as the interceptor):

- OTEL_PROM_EXPORTER_ENABLED (default: true)
- OTEL_PROM_EXPORTER_PORT (default: 2223)
- OTEL_EXPORTER_OTLP_METRICS_ENABLED (default: false)
- OTEL_EXPORTER_OTLP_TRACES_ENABLED (default: false)
- OTEL_EXPORTER_OTLP_TRACES_PROTOCOL (default: console)
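
For readers who want to see how the defaults listed above behave, here is a minimal standard-library sketch. Only the env var names and default values come from the PR description. The struct, its field names, and the helper functions are hypothetical, and the scaler's own config uses `env:` struct tags rather than hand-written lookups.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// observabilityConfig is a hypothetical stand-in for the scaler's settings.
type observabilityConfig struct {
	PromExporterEnabled bool   // OTEL_PROM_EXPORTER_ENABLED
	PromExporterPort    int    // OTEL_PROM_EXPORTER_PORT
	OTLPMetricsEnabled  bool   // OTEL_EXPORTER_OTLP_METRICS_ENABLED
	OTLPTracesEnabled   bool   // OTEL_EXPORTER_OTLP_TRACES_ENABLED
	OTLPTracesProtocol  string // OTEL_EXPORTER_OTLP_TRACES_PROTOCOL
}

// lookupFunc has the same shape as os.LookupEnv so values can be injected.
type lookupFunc func(key string) (string, bool)

// boolOr returns def when key is unset, otherwise the parsed boolean.
func boolOr(lookup lookupFunc, key string, def bool) (bool, error) {
	raw, ok := lookup(key)
	if !ok {
		return def, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def, fmt.Errorf("%s: %w", key, err)
	}
	return v, nil
}

// intOr returns def when key is unset, otherwise the parsed integer.
func intOr(lookup lookupFunc, key string, def int) (int, error) {
	raw, ok := lookup(key)
	if !ok {
		return def, nil
	}
	v, err := strconv.Atoi(raw)
	if err != nil {
		return def, fmt.Errorf("%s: %w", key, err)
	}
	return v, nil
}

// stringOr returns def when key is unset, otherwise the raw value.
func stringOr(lookup lookupFunc, key, def string) string {
	if raw, ok := lookup(key); ok {
		return raw
	}
	return def
}

// loadObservabilityConfig applies the defaults from the PR description.
func loadObservabilityConfig(lookup lookupFunc) (observabilityConfig, error) {
	var (
		cfg observabilityConfig
		err error
	)
	if cfg.PromExporterEnabled, err = boolOr(lookup, "OTEL_PROM_EXPORTER_ENABLED", true); err != nil {
		return cfg, err
	}
	if cfg.PromExporterPort, err = intOr(lookup, "OTEL_PROM_EXPORTER_PORT", 2223); err != nil {
		return cfg, err
	}
	if cfg.OTLPMetricsEnabled, err = boolOr(lookup, "OTEL_EXPORTER_OTLP_METRICS_ENABLED", false); err != nil {
		return cfg, err
	}
	if cfg.OTLPTracesEnabled, err = boolOr(lookup, "OTEL_EXPORTER_OTLP_TRACES_ENABLED", false); err != nil {
		return cfg, err
	}
	cfg.OTLPTracesProtocol = stringOr(lookup, "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL", "console")
	return cfg, nil
}

func main() {
	cfg, err := loadObservabilityConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

With none of the variables set, the sketch yields the Prometheus exporter enabled on port 2223, both OTLP exporters disabled, and the trace protocol set to console.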

Checklist

Commits are signed with Developer Certificate of Origin (DCO)
Changelog has been updated and is aligned with our changelog requirements
Any necessary documentation is added, such as:
- README.md
- keda-docs

Part of #965

snyk-io · 2026-05-13T15:14:48Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues

Copilot

Pull request overview

This PR adds OpenTelemetry-based observability to the external scaler by introducing metric instruments/exporters and distributed tracing, plus wiring them into the scaler’s startup and queue polling logic.

Changes:

- Added an OTEL metrics provider + instruments for queue pinger fetch duration/errors and endpoint count, with Prometheus and optional OTLP/HTTP export.
- Added OTEL tracing SDK setup and enabled automatic gRPC server span instrumentation via otelgrpc when tracing is enabled.
- Wired metrics/tracing configuration into scaler config and main startup, including a /metrics HTTP endpoint.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.

File	Description
scaler/tracing/tracing.go	Adds OTEL tracing SDK setup and exporter selection for the scaler.
scaler/metrics/provider.go	Introduces an OTEL `MeterProvider` with Prometheus and optional OTLP metric export.
scaler/metrics/instruments.go	Defines metric instruments and recording helpers for queue pinger metrics.
scaler/queue_pinger.go	Records pinger fetch metrics on each polling cycle.
scaler/queue_pinger_test.go	Updates pinger construction in tests for the new instruments parameter.
scaler/main.go	Initializes metrics/tracing, adds Prometheus `/metrics` server, and instruments gRPC when enabled.
scaler/config.go	Adds env-configurable metrics and tracing settings for the scaler.
go.mod	Adds `otelgrpc` dependency for gRPC tracing instrumentation.
go.sum	Updates sums for added/updated dependencies.

linkvt

I left a few comments. We can probably reduce the size quite a bit by deduplicating code we already have in the interceptor.

The PR description should probably also not say "Fixes #...", to avoid auto-closing the issue.
We should also remember to update the Helm chart and the resources in config/ in this repo.

linkvt · 2026-05-19T07:49:31Z

 	if err != nil {
 		setupLog.Error(err, "Kubernetes client config not found")
-		os.Exit(1)
+		runtime.Goexit()


Is there a reason for using runtime.Goexit now? If this is intended, we should probably also add defer os.Exit(1) at the top of main, as in the interceptor, so that the gRPC server etc. are also stopped after runtime.Goexit has been called.

Fixed — added defer os.Exit(1) at the top of main (same pattern as the interceptor). All subsequent failures use runtime.Goexit() to ensure defers run.

linkvt · 2026-05-19T07:51:04Z

+	AttrNamespace = "namespace"
+	AttrService   = "service"


unused vars

linkvt · 2026-05-19T07:53:30Z

@@ -0,0 +1,91 @@
+package tracing


This looks like a copy of interceptor/tracing/tracing.go. We should deduplicate this code.

Deduplicated — extracted shared tracing setup into pkg/observability/tracing.go. Both interceptor and scaler now delegate to it.

linkvt · 2026-05-19T07:54:48Z

@@ -0,0 +1,51 @@
+package metrics


This looks like a copy of interceptor/provider/metrics.go. We should deduplicate this code.

Deduplicated — extracted shared meter provider factory into pkg/observability/metrics.go. Both interceptor and scaler delegate to it with their respective service names.

linkvt · 2026-05-19T07:57:42Z

-	perPod, err := fetchCountsPerPod(ctx, q.lggr, q.getEndpointsFn, q.interceptorNS, q.interceptorSvcName, q.adminPort)
+	fetchStart := time.Now()
+	result, err := fetchCountsPerPod(ctx, q.lggr, q.getEndpointsFn, q.interceptorNS, q.interceptorSvcName, q.adminPort)
+	if q.instruments != nil {


Could we use the same pattern as the interceptor's instruments, with NewNoopInstruments(), and pass that into the components? That keeps the code cleaner because we avoid a special case for instruments being nil.

Done — added NewNoopInstruments() and all tests now use it instead of nil. The nil check in fetchAndSaveCounts is removed.

linkvt · 2026-05-19T08:13:01Z

+	meterName = "keda-external-scaler"
+
+	// ServiceName is the OTEL service.name used for both metrics and tracing.
+	ServiceName = "keda-http-external-scaler"


We should probably also align the interceptor's service name with this naming scheme.

Good point. The interceptor already uses keda-http-interceptor as its service name (set in interceptor/tracing/tracing.go). I think that's consistent with the scaler's keda-http-external-scaler. Should we rename the interceptor to something like keda-http-interceptor-proxy or is keda-http-interceptor fine?

linkvt · 2026-05-19T08:24:29Z

 	ProfilingAddr string `env:"PROFILING_BIND_ADDRESS" envDefault:""`
 	// StreamIntervalMS is the interval in milliseconds between stream ticks
 	StreamIntervalMS int `env:"KEDA_HTTP_SCALER_STREAM_INTERVAL_MS" envDefault:"200"`
+


I guess we could also deduplicate the config to ensure it is consistent across components?

Done — extracted MetricsConfig and TracingConfig into pkg/observability/config.go. Both interceptor and scaler use type aliases to it.
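
To illustrate why a type alias helps here, the following single-file sketch contrasts an alias with a defined type. The two field names come from the diff in this PR. The rest is hypothetical: `MetricsConfig` stands in for the shared type in pkg/observability/config.go, and `scalerMetricsConfig`, `definedMetricsConfig`, and `describe` exist only for this example.

```go
package main

import "fmt"

// MetricsConfig stands in for the shared configuration type.
type MetricsConfig struct {
	OtelPrometheusExporterEnabled bool
	OtelPrometheusExporterPort    int
}

// describe is a hypothetical shared helper that accepts the shared type.
func describe(c MetricsConfig) string {
	return fmt.Sprintf("prometheus enabled=%t port=%d",
		c.OtelPrometheusExporterEnabled, c.OtelPrometheusExporterPort)
}

// scalerMetricsConfig is a type alias (note the "="): it is the same type as
// MetricsConfig, so values pass to shared helpers without conversion.
type scalerMetricsConfig = MetricsConfig

// definedMetricsConfig is a new defined type with the same fields. It is
// distinct from MetricsConfig and needs an explicit conversion.
type definedMetricsConfig MetricsConfig

func main() {
	aliased := scalerMetricsConfig{OtelPrometheusExporterEnabled: true, OtelPrometheusExporterPort: 2223}
	fmt.Println(describe(aliased)) // no conversion needed

	defined := definedMetricsConfig{OtelPrometheusExporterEnabled: true, OtelPrometheusExporterPort: 2223}
	fmt.Println(describe(MetricsConfig(defined))) // conversion required
}
```

With an alias, each component keeps its local type name while the shared package remains the single definition of the fields and their env tags.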

linkvt · 2026-05-19T08:27:51Z

+
+type metricsConfig struct {
+	OtelPrometheusExporterEnabled bool `env:"OTEL_PROM_EXPORTER_ENABLED" envDefault:"true"`
+	OtelPrometheusExporterPort    int  `env:"OTEL_PROM_EXPORTER_PORT" envDefault:"2224"`


Is there a reason to not use the same port we use for the interceptor metrics?

Changed to 2223 (same as the interceptor).

linkvt · 2026-05-19T08:29:27Z

+
+// RecordFetch records a completed pinger fetch cycle.
+func (i *Instruments) RecordFetch(duration time.Duration, endpointCount int, fetchErr error) {
+	attrs := api.WithAttributeSet(attribute.NewSet())


This can be removed, as there are no attributes.

linkvt · 2026-05-19T08:29:38Z

@@ -0,0 +1,81 @@
+package metrics


We should probably also add a test like the prometheus_test.go the interceptor has right now?

Added scaler/metrics/prometheus_test.go that verifies the histogram, counter, and gauge are correctly emitted.

…caler

Add OpenTelemetry-based metrics and distributed tracing to the external
scaler component, which previously had no observability instrumentation.

Shared observability infrastructure is extracted into pkg/observability/
so both the interceptor and scaler (and future components) reuse the same
tracing setup, metrics provider, and configuration types.

Metrics:
- scaler.pinger.fetch.duration (histogram)
- scaler.pinger.fetch.errors (counter)
- scaler.pinger.endpoints (gauge)
- Prometheus /metrics endpoint on port 2223 (configurable)
- Optional OTLP HTTP metrics export

Tracing:
- OTEL tracing SDK with console, HTTP/protobuf, and gRPC exporters
- otelgrpc stats handler for automatic gRPC server span instrumentation
- W3C TraceContext + Baggage propagation

Relates to: kedacore#965

Signed-off-by: Mikhail Fedosin <mfedosin@redhat.com>

Fedosin · 2026-05-19T09:36:18Z

Thanks for the detailed review @linkvt — addressed all inline comments in the latest force-push (f8a351b):

- deduplicated tracing + meter provider into pkg/observability/
- deduplicated metrics/tracing config types (MetricsConfig, TracingConfig)
- switched scaler metrics default port to 2223 (same as interceptor)
- added defer os.Exit(1) + runtime.Goexit() pattern in scaler main
- removed unused vars and empty attribute set
- added NewNoopInstruments() and removed nil special-casing
- added scaler/metrics/prometheus_test.go
- updated PR description to use Part of #965 instead of Fixes

On your note about Helm/config resources: this repo currently has config/ manifests but no Helm chart directory. Since these env vars are optional and have defaults, runtime behavior is unchanged unless users set them. I can add explicit env var wiring to config/scaler/deployment.yaml in this PR too if you’d prefer that visibility.

Copilot AI review requested due to automatic review settings May 13, 2026 15:14

Fedosin requested a review from a team as a code owner May 13, 2026 15:14

keda-automation requested a review from a team May 13, 2026 15:14

Copilot started reviewing on behalf of Fedosin May 13, 2026 15:15

Fedosin force-pushed the scaler-observability branch from b55975d to 6121c89 Compare May 13, 2026 15:19

Copilot AI reviewed May 13, 2026

Comment thread scaler/queue_pinger.go Outdated

Comment thread scaler/metrics/provider.go Outdated

Fedosin force-pushed the scaler-observability branch 3 times, most recently from 744e536 to a4f8dbb Compare May 18, 2026 15:08

linkvt requested changes May 19, 2026

Fedosin force-pushed the scaler-observability branch from a4f8dbb to f8a351b Compare May 19, 2026 09:32

keda-automation requested a review from a team May 19, 2026 09:33

Fedosin force-pushed the scaler-observability branch from f8a351b to 9b6abde Compare May 19, 2026 09:34

3 participants